European Journal of Epidemiology — Latest Matching Preprints

1

Mortality prediction by a metabolomics score and health- and lifestyle-related factors combined

Schorr, K.; Rodriguez Girondo, M.; de Groot, L.; Slagboom, P. E.; Beekman, M.

2026-02-03 epidemiology 10.64898/2026.02.01.26345306 medRxiv

Top 0.1%

18.9%

The ageing society and worldwide rise of chronic disease make adequate early identification of at-risk individuals and preventive intervention highly relevant to public health. Molecular indicators of global health have been developed, such as the metabolomics-based MetaboHealth score. A shortcoming of molecular biomarkers may be their lack of integration of lifestyle and environmental factors relevant for health span. Hence, we explored the MetaboHealth biomarker and a range of health- and lifestyle-related factors, including plant-based diet index, physical activity, alcohol use, smoking, medication use, 25(OH)D status, socioeconomic position and education, in a subpopulation (n=35,192, mean age=56 years) of the UK Biobank cohort. We analysed which of these factors were associated independently with mortality, which were associated with the MetaboHealth score, and which of the independent factors improved mortality prediction by MetaboHealth. Using multivariate Cox regression modelling, we found that all factors were associated independently with prospective survival, except for physical activity and education level. Sex, smoking and income were most strongly associated with both mortality and the MetaboHealth score. By cross-validation we subsequently assessed the contribution of all independent health- and lifestyle-related factors to MetaboHealth-based mortality prediction and computed a weighted score. Income and medication intake contributed most to prediction, and diet least. In conclusion, MetaboHealth partly reflects the effect of health- and lifestyle-related factors, while identification of at-risk individuals is improved by information on income and medication use. Insight into these factors can be obtained non-intrusively and may therefore be taken into account in the context of population health management.

2

Educational Inequalities in Well-Being in Later Life in Germany: The Role of Health Behaviours and Health Literacy

Franzese, F.; Bergmann, M.; Burzynska, A.

2026-04-24 epidemiology 10.64898/2026.04.22.26351388 medRxiv

Top 0.1%

12.5%

Socioeconomic inequalities in health and well-being are a major public health concern, particularly in ageing populations. Education is a key determinant shaping multiple aspects of health outcomes. We used cross-sectional data from wave 9 of the German sample (n=4,148) of the Survey of Health, Ageing and Retirement in Europe (SHARE) to test whether formal education is associated with well-being in later adulthood, with health literacy, self-rated health, and preventive health behaviours as possible mediators. Our results showed that education was positively associated with greater well-being, but only via indirect pathways. Specifically, self-rated health, health literacy, and fruit and vegetable consumption mediated the relationship between education and well-being accounting for 54.7, 24.7, and 12.6 percent of the total effect, respectively. In addition, there were significant positive correlations between education and health literacy, as well as high-intensity physical activity, daily fruit and vegetable consumption, more preventive health check-ups, and less smoking. In contrast, alcohol consumption was more common among those with higher levels of education. All health behaviours and health literacy were correlated directly or indirectly (i.e., mediated by health) with well-being. These findings highlight the importance of examining indirect pathways linking education to well-being in later life. Interventions aimed at improving health literacy and promoting healthy behaviours may help reduce educational inequalities in quality of life among older adults.

3

Early life blood pressure, cognitive function and brain aging in mid-to-late life: A synthetic longitudinal cohort analysis

Bustillo, A. J.; Zeki Al Hazzouri, A.; Glymour, M. M.; Kezios, K.

2026-02-26 epidemiology 10.64898/2026.02.24.26346790 medRxiv

Top 0.1%

12.3%

PURPOSE: Over 6.9 million Americans above the age of 65 are living with Alzheimer's disease (AD) or related dementias (ADRDs), which are diseases characterized by cognitive decline and structural brain changes associated with accelerated brain aging. Cardiovascular risk factors, in particular hypertension, are well-studied risk factors for AD/ADRD. Evidence suggests that the effects of hypertension on cognitive aging may vary by life stage, yet prior studies have focused on the effects of mid- or late-life hypertension or blood pressure, leaving other life stages, including early life, unstudied. However, owing to the logistical complexity of follow-up throughout the life course, cognitive aging cohorts lack early-life blood pressure exposure data and cognitive and brain aging outcome data in mid/late life. When such data are unavailable from any single data source, data fusion methods may be employed to pool two compatible data sources to impute an early-life blood pressure exposure history and produce a synthetic longitudinal cohort in which the associations between early-life blood pressure and mid/late-life cognition and brain aging can be estimated. The purpose of this work is to estimate the association between early-life blood pressure and mid- and late-life cognition and brain aging in a synthetic longitudinal cohort. METHODS: We pooled the Bogalusa Heart Study (BHS), which provided early-life blood pressure data (ages 4-16), and the CARDIA study, which provided mid/late-life cognition and brain aging outcome data (ages 58-70), to generate a synthetic longitudinal cohort. Cognition was defined as cognitive domain scores (including executive function, memory, processing speed, and language) calculated by Z-transforming cognitive test scores within each cohort. Global cognition was calculated as the average of these Z-scores.
Brain aging was defined using the Spatial Patterns of Atrophy for Recognition of Brain Aging, a measure of age-related brain atrophy based on T1-weighted MRI scans. The cohorts overlapped at ages 17-57 for potential matching variables including blood pressure, sociodemographics, and vascular risk factors. Cognition overlapped between ages 41-58. We pooled data by many-to-one distance matching (BHS to CARDIA) on mediators and confounders of each exposure-disease relationship that overlapped in age of measurement between the two cohorts. These variables included intermediate values of the exposure (blood pressure, ages 17-57) and cognition (ages 41-58), in addition to sociodemographic and vascular risk factors. Linear regression models estimated the association between early-life blood pressure and cognitive and brain aging outcomes. RESULTS: BHS uniquely provided early-life blood pressure data (ages 4-16), while CARDIA provided cognitive and brain aging data at ages 58-70. Matching is feasible between the ages of 17-57 on blood pressure, sociodemographics, and vascular risk factors, but 41-57 for cognition. CONCLUSIONS: Our results demonstrate the feasibility and suitability of two US-based cardiovascular cohorts for generating a synthetic lifecourse cohort to estimate early-life blood pressure and its association with mid/late-life cognitive and brain aging outcomes. Future studies should aim to use measures that more closely overlap between both cohorts. Additionally, future studies should interrogate greater spans, such as early life through late life.
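
The global cognition score described in the methods above (Z-transform each domain within a cohort, then average across domains) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the dictionary layout, and the use of the population standard deviation are assumptions, since the abstract does not specify them.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Z-transform raw scores within a single cohort.

    Assumption: population standard deviation is used; the abstract
    does not say whether the sample or population form was applied.
    """
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("cannot standardize scores with zero variance")
    return [(v - mu) / sd for v in values]


def global_cognition(domain_scores):
    """Average the per-domain Z-scores for each participant.

    domain_scores maps a (hypothetical) domain name to a list of raw
    scores, one per participant, all taken from the same cohort.
    """
    z_by_domain = [z_scores(scores) for scores in domain_scores.values()]
    n_participants = len(z_by_domain[0])
    return [mean(z[i] for z in z_by_domain) for i in range(n_participants)]


# Toy data for three participants in one cohort (made-up numbers).
example = {
    "memory": [1.0, 2.0, 3.0],
    "executive_function": [10.0, 20.0, 30.0],
}
composite = global_cognition(example)
```

Because standardization is done within each cohort, the composite expresses each participant's position relative to their own cohort rather than on a shared raw-score scale.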

4

Subjective Financial Strain and Incident Heart Disease Among US Adults Aged 50 Years or Older

Tharp, D.

2026-02-25 epidemiology 10.64898/2026.02.23.26346937 medRxiv

Top 0.1%

10.0%

Background: Financial strain has been linked to adverse cardiovascular outcomes, yet whether this association persists beyond objective socioeconomic resources remains unclear. We examined associations of financial strain with incident heart disease and all-cause mortality among US adults aged 50 years or older. Methods: Prospective cohort study using the Health and Retirement Study (2006-2022). Among 7219 participants completing the Psychosocial Leave-Behind Questionnaire, the exposure was ongoing financial strain (high vs low/none). Incident heart disease was assessed among 4956 participants without baseline cardiovascular disease using cause-specific Cox and Fine-Gray models. All-cause mortality was modeled using sequential Cox regression. Results: Among 7219 participants (mean [SD] age, 67.5 [10.6] years; 58.6% female), 1423 (19.7%) reported high financial strain. Financial strain was associated with incident heart disease (cause-specific HR, 1.18; 95% CI, 1.02-1.37; P = .03; 1310 events), corroborated by Fine-Gray models (SHR, 1.16; 95% CI, 1.00-1.34). For all-cause mortality (3466 deaths), financial strain was associated after demographic and clinical adjustment (HR, 1.17; 95% CI, 1.07-1.28) but attenuated after further adjustment for income and wealth (HR, 1.10; 95% CI, 1.00-1.20; P = .051). The mortality association differed by age (interaction P = .001): HR, 1.25 (95% CI, 1.03-1.52) for adults younger than 65 years versus HR, 1.04 (95% CI, 0.94-1.16) for those 65 or older. Conclusions: Financial strain was associated with incident heart disease independent of socioeconomic resources. The mortality association was attenuated by income and wealth adjustment but remained elevated among preretirement adults. Financial strain may be a clinically accessible marker of cardiovascular risk among working-age adults.

5

Traditional disease risk factors outperform epigenetic clocks as predictors of non-communicable disease morbidity in a middle-aged cohort

Kostiniuk, D.; Szekely, F.; Lyytikäinen, L.-P.; Ciantar, J.; Rajic, S.; Mishra, P. P.; Lehtimäki, T.; Pahkala, K.; Rovio, S.; Mykkänen, J.; Raitakari, O. T.; Raitoharju, E.; Marttila, S.

2026-02-14 molecular biology 10.64898/2026.02.10.705233 medRxiv

Top 0.1%

9.9%

DNA methylation-based epigenetic clocks have been highlighted as promising biomarkers of ageing, and they have been shown to robustly predict morbidity and mortality. However, current literature is lacking a formal analysis of the increased prediction accuracy, or the added value, of the epigenetic clocks over traditional risk factors of common chronic diseases. Here, we have compared the most commonly used epigenetic clocks and traditional risk factors as predictors of incidence of ageing-associated non-communicable chronic disease in a 7-to-9-year follow-up in a middle-aged population cohort (n=1108, aged 34 to 49 years at baseline). In our cohort, a statistical model consisting of a combination of traditional risk factors outperforms any model including an epigenetic clock. The added value of epigenetic clock measurements over simple and affordable traditional risk factors should be clearly established, if epigenetic clocks are to be used in clinical settings or as tools of personal health monitoring.

6

Life Course Socioeconomic Position and health in older adulthood age: A Formal Mediation Analysis in the 1958 British Birth Cohort

Guo, Y.; Pelikh, A.; Ploubidis, G. B.; Goodman, A.

2026-03-25 epidemiology 10.64898/2026.03.23.26349085 medRxiv

Top 0.1%

9.8%

Background: Childhood socioeconomic position (SEP) is a key determinant of later-life health. Understanding the extent to which adult SEP mediates this association into early old age is important for explaining how health inequalities are propagated across generations and how they might be addressed in later life. To our knowledge, no prospective study has examined whether childhood SEP remains associated with health at the threshold of older age and the extent to which any such association is mediated by adult SEP. Methods: We used data from the 1958 British Birth Cohort, a prospective study that has followed participants since birth, drawing on earlier data collected at birth and ages 33 and 55 years and newly collected data from the age 62 sweep. Using interventional causal mediation analyses, we assessed whether adult occupational class, education, housing tenure, and income mediate associations between childhood social class (manual vs non-manual) and health at age 62 (self-rated health, C-reactive protein [CRP], cholesterol ratio, glycated hemoglobin [HbA1c], and N-terminal pro-B-type natriuretic peptide [NT-proBNP]). Findings: Associations between childhood SEP and self-rated health, CRP, cholesterol ratio, and HbA1c persisted after accounting for adult SEP. Mediation was outcome specific and differed by sex. Among men, occupational class mediated 39% of the association with self-rated health (indirect effect RR 0.90, 95% CI 0.86, 0.95) and education mediated 27% (0.93, 0.90, 0.96). Among women, education mediated 10% (0.95, 0.91, 0.98) and housing tenure mediated 6% (0.97, 0.94, 0.99). Indirect effects for CRP were smaller, and mediation was minimal for cholesterol ratio, HbA1c, and NT-proBNP. Interpretation: Population-level improvements in adult SEP could reduce, but are unlikely to eliminate, later-life health inequalities associated with childhood SEP.
Reducing these inequalities will require policies that address disadvantage in early life and improve adult financial and employment conditions. Funding: UK Economic and Social Research Council.

7

Heart Stress, Frailty and Mortality Risk in two prospective cohorts

Huang, Y.; Hao, M.; Jiang, S.; Li, X.; Tang, Y.; Hu, Z.; Wang, X.; Han, L.; Li, Y.; Zhang, H.

2026-01-26 epidemiology 10.64898/2026.01.25.26344776 medRxiv

Top 0.1%

9.3%

Importance: Frailty is a multisystem syndrome that reflects age-related physiological decline, underscoring the need for more biologically informed risk stratification within frailty assessments. Frailty and heart stress (HS) are individually associated with increased mortality risk, but their combined effects remain practically unexplored. Objective: To evaluate whether the combined exposure to frailty and HS is associated with an increased risk of mortality. Design, Setting, and Participants: This prospective cohort study used data from the US National Health and Nutrition Examination Survey (NHANES) and the Health and Retirement Study (HRS). Participants with complete data on frailty and HS were included. Analyses were performed between May 2025 and October 2025. Exposure: Frailty was assessed using three frailty indices (FI) based on self-reported items (FI-Self-report), blood biomarkers (FI-Lab), and their combination (FI-Combined). HS was defined by age-adjusted elevation in N-terminal pro-B-type natriuretic peptide (NT-proBNP) levels. Participants were classified into four groups according to baseline frailty and HS status. Main Outcomes and Measures: The primary outcome was all-cause mortality. Cox proportional hazards models were used to calculate hazard ratios (HRs) and 95% confidence intervals (CIs). Results: A total of 12,252 participants from NHANES (mean age 49.91 years, 52.18% female) and 9,488 participants from HRS (mean age 69.16 years, 58.97% female) were included. Compared with those having neither frailty nor HS, participants with frailty and/or HS showed significantly elevated mortality risk in both cohorts, with HRs ranging from 1.81 to 5.54.
The highest mortality risk was observed in participants with both frailty and HS: the HRs were 3.58 (95% CI: 3.20-4.01) for FI-Self-report, 3.43 (95% CI: 3.04-3.86) for FI-Lab, and 4.15 (95% CI: 3.70-4.67) for FI-Combined in NHANES; the corresponding HRs in HRS were 5.02 (95% CI: 4.38-5.76), 4.73 (95% CI: 4.13-5.41), and 5.54 (95% CI: 4.84-6.35), respectively. Conclusions and Relevance: Co-occurrence of frailty and HS is common and is jointly associated with increased mortality risk in the general population. These findings support integrating HS into frailty assessments to improve mortality risk stratification and guide targeted interventions. Key Points. Question: Is the combination of frailty and heart stress (HS) associated with increased mortality risk? Findings: In this prospective cohort study including 12,252 participants from the US National Health and Nutrition Examination Survey (NHANES) and 9,488 participants from the Health and Retirement Study (HRS), participants with frailty and/or HS exhibited higher risk of all-cause mortality. The greatest mortality risk was found among participants with both frailty and HS. Meaning: These findings indicate that co-occurrence of frailty and HS is associated with increased mortality risk, supporting integration of HS into frailty assessment for risk stratification and intervention.

8

Quantifying bias from reverse causation in observational studies of dementia risk factors: A simulation study informed by age-specific reverse Mendelian Randomization

Wang, J.; Ackley, S.; Chen, R.; Kezios, K.; Zeki Al Hazzouri, A.; Blacker, D.; Torres, J. M.; Glymour, M. M.

2026-02-23 epidemiology 10.64898/2026.02.21.26346807 medRxiv

Top 0.1%

8.7%

Background: The long preclinical phase of dementia can bias estimated effects of baseline exposures on dementia incidence. We demonstrate simulations informed by reverse Mendelian randomization (MR) findings to quantify the age-specific magnitude of reverse causation bias in observational studies of the effects of body mass index (BMI) on dementia. Methods: We simulated longitudinal trajectories of BMI and dementia risk from ages 45 to 90 years, calibrating to published evidence on age-specific dementia incidence, BMI, and associations of dementia genetic risk with BMI. Under the null that BMI does not influence dementia and an alternative that BMI at any age increases subsequent dementia risk, we simulated hypothetical cohort studies (n=20,000, average 15 years of follow-up), varying age of entry from 45 to 80 years. In each hypothetical cohort, the association of z-standardized BMI at study entry with dementia incidence was estimated using Cox proportional hazards models. Bias was quantified using the ratio of observed to true hazard ratios (RHRs). All scenarios were replicated 500 times. Results: In the absence of a causal effect of BMI on dementia, when follow-up began at age 65 years, the RHR was 0.91 (95% CI: 0.90-0.92). When follow-up began at age 80 years, the RHR decreased to 0.68 (95% CI: 0.67-0.69), indicating substantial bias attributable to reverse causation. Conclusion: Reverse causation, presumably arising from preclinical dementia, can induce substantial bias in estimates of the association between baseline exposures and dementia incidence. Simulations provide a convenient tool to quantify this bias.
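
The bias metric named in the methods above, the ratio of observed to true hazard ratios (RHR), is simple arithmetic and can be sketched as below. The function names are illustrative assumptions and the snippet is not the authors' simulation code; it only shows how an RHR relates to the figures reported in the abstract. Under the null scenario the true hazard ratio is 1, so the RHR equals the observed hazard ratio.

```python
import math


def ratio_of_hazard_ratios(observed_hr, true_hr):
    """RHR = observed HR / true HR; a value of 1 means no bias."""
    if observed_hr <= 0 or true_hr <= 0:
        raise ValueError("hazard ratios must be positive")
    return observed_hr / true_hr


def log_scale_bias(observed_hr, true_hr):
    """Bias on the log-hazard scale, i.e. log(RHR); 0 means no bias."""
    return math.log(ratio_of_hazard_ratios(observed_hr, true_hr))


# Null scenario (true HR = 1) with the observed values from the abstract.
rhr_entry_65 = ratio_of_hazard_ratios(0.91, 1.0)
rhr_entry_80 = ratio_of_hazard_ratios(0.68, 1.0)
```

An RHR below 1 means the observed association is biased toward an apparently protective effect of higher BMI, which is the direction expected if preclinical dementia lowers BMI before diagnosis.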

9

Cohort Profile Update: Survey of Health, Ageing and Retirement in Europe: Biomarker data for age-related health conditions

Boersch-Supan, M.; Boersch-Supan, A.; Andersen-Ranberg, K.; Borbye-Lorenzen, N.; Cofferen, J.; Deza-Lougovski, Y. I.; Groh, R.; Holmgaard, S.; Horton, H. M.; Kerschner, E.; Minh Kha, T.; Potter, A. J.; Rieckmann, A.; Schmidutz, D.; Louring-Skogstrand, K.; Sun, A.; Weiss, L. M.; Wener, M. H.

2026-01-30 epidemiology 10.64898/2026.01.28.26344911 medRxiv

Top 0.1%

8.5%

SHARE, the "Survey of Health, Ageing and Retirement in Europe", is the largest population-based panel survey among people aged 50+ in 28 European countries and Israel. It investigates health, economic and social circumstances over the life-course to shed light on the challenges of population ageing. From 2004 until 2023, more than 615,000 in-depth interviews with 160,000 respondents were conducted in nine survey waves. Health is crucial to understand ageing. Gold-standard measures are based on blood. SHARE therefore collected dried blood spot (DBS) samples in 12 countries during Wave 6 in 2015. Approximately 27,200 respondents consented (67%). DBS samples were analysed for a total of 21 blood biomarkers in three distinct sets (conventional, cytokine and neurodegenerative biomarkers) between 2017 and 2025. The collection of blood-based health data expands SHARE's socioeconomic focus with epidemiological insights. SHARE has sparked many collaborations since 2004, and we expect the new biomarker data to inspire further collaborative biomarker projects and data sharing.

10

Mapping the Dynamic Interplay of Mental Health and Weight Across Childhood: Data-Driven Explorations Using Causal Discovery

Larsen, T. E.; Lorca, M. H.; Ekstrom, C. T.; Vinding, R.; Bonnelykke, K.; Strandberg-Larsen, K.; Petersen, A. H.

2026-04-17 epidemiology 10.64898/2026.04.16.26350943 medRxiv

Top 0.1%

7.2%

Childhood weight development, especially overweight and obesity, has been associated with mental health, but their dynamic, causal relationships, and whether these differ by sex, remain unclear. We applied causal discovery to data from the Danish National Birth Cohort (n=67,593) spanning six periods from pregnancy to late adolescence and considering 67 variables related to child and parental weight, mental health, lifestyle, and socio-economic factors. We found no statistically significant difference between the causal graphs for boys and girls (P=0.079). The data-driven models found causal influence of childhood weight on subsequent weight status. Mental health pathways were exclusively within or across adjacent periods and centered on early adolescent stress. We examined the interplay between a subset of mental health variables, containing information on externalizing and internalizing problems, and weight, and found no direct causal pathway between the two processes. These findings suggest that observed links between weight and these mental health measures may be attributable to confounding. Our findings demonstrate the value of data-driven causal discovery in large cohort studies and how to test for differences in causal mechanisms across subgroups. Results are available in an interactive application, enabling future research to further explore the interplay between weight and mental health.

11

Frailty progression following severe infections in adults aged 65 years and above in US and England: two matched cohort studies

Asare, K.; Mansfield, K. E.; Gore-Langton, G. R.; Cadogan, S. L.; Barry, E.; Keogh, R.; Lo Re, V.; Rodriguez-Barradas, M. C.; Justice, A. C.; Rentsch, C. T.; Warren-Gash, C.

2026-03-15 epidemiology 10.64898/2026.03.13.26348319 medRxiv

Top 0.1%

6.9%

Background: We investigated frailty progression after severe infections in adults (≥65 years) in the US and England. Methods: We conducted parallel matched cohort studies using the US Veterans Aging Cohort Study (VACS-National, 2008-2019; median age 74 years; 98% male) and the English Clinical Practice Research Datalink (2006-2019; median age 76 years; 45% male). Adults hospitalised primarily for infection (i.e., severe infection) were matched in calendar date order to individuals without severe infection on age, sex, care site, and (US only) race and ethnicity. We measured frailty using VACS Index 2·0 (US) and Electronic Frailty Index (eFI; England). We estimated annual conditional mean frailty differences between adults with versus without severe infection using linear regression adjusting for baseline frailty, demographics, lifestyle factors, infection history, and (US only) comorbidities. Results: Mean baseline frailty was higher in those with severe infection than those without (US: 57 v 48; England: 0·17 v 0·12). At Year 1, adjusted mean frailty was higher among adults with severe infections than those without (US: VACS Index +2·0, 95% CI 1·9-2·0; England: eFI +0·005, 95% CI 0·005-0·006). At Years 2-5, adjusted mean frailty remained higher after severe infection; however, compared to Year 1, differences were smaller in the US and larger in England. Effects varied by infection type (strongest for lower respiratory tract infections, meningoencephalitis (UK only), urinary tract infections, and sepsis). Interpretation: Individuals with severe infections had higher frailty at baseline and follow-up than those without. Preventing both frailty and infections is important for improving health in older age.
Funding: Wellcome. Research in context. Evidence before this study: We searched PubMed (inception to October 27, 2025) for published articles evaluating the association between infections and frailty, with no language restrictions. We used the search terms [(infection OR infectious) AND (frailty OR frail)]. We found fifteen observational studies investigating associations between individual infections (including HIV, cytomegalovirus, SARS-CoV-2, acute respiratory infection, urinary tract infection, and influenza) and frailty in adults. Frailty measures varied: eight studies used Fried's phenotype index, six used versions of the cumulative deficit index (i.e., Edmonton Frail Scale, FRAIL-NH Scale, Hospital Frailty Risk Score, Clinical Frailty Score, Veterans Affairs Frailty Index, Vulnerable Elders Survey-13), and one study used the Timed Up and Go Test. Results from identified studies were mixed, with nearly half (7/15) reporting a positive association between the infection studied and frailty, and the remaining eight finding no evidence of association. In cross-sectional analyses, HIV, SARS-CoV-2, cytomegalovirus, and urinary tract infection were each associated with higher mean frailty scores or frailty prevalence. In longitudinal analysis, hospitalisation for acute respiratory infection was followed by higher mean hospital frailty risk scores two years post-discharge. SARS-CoV-2 infection was associated with early onset (i.e., higher hazard) of frailty over three years of follow-up. However, other studies found no association between HIV, SARS-CoV-2, acute respiratory infection and influenza, and frailty prevalence, incidence, or transition between frailty states. These mixed findings may reflect methodological differences between the studies, including variation in frailty measures, and study limitations. Frailty exists along a continuum of vulnerability, and progression after infection may be an important outcome, yet current evidence is scarce.
It remains unclear whether severe infections, or different types of infection, are associated with faster frailty deterioration. Similarly, it is uncertain whether post-infection frailty risk varies by pathogen (bacterial, viral, parasitic, fungal), infection type (sepsis, urinary tract infection, skin and soft tissue infection, meningitis/encephalitis, lower respiratory tract, gastroenteritis), or by age, sex, social deprivation, and pre-existing comorbidities. Added value of this study: Our study compared frailty progression over a five-year period between adults aged ≥65 years with severe infection (hospitalisation primarily due to infection) versus comparators without severe infection. We found higher baseline frailty at severe infection onset than in matched comparators. We saw evidence of increased frailty progression over time in people following severe infections compared to those without; however, these differences were small. We also saw higher risk of worsening frailty progression in older adults and those with dementia. Further, worsening frailty progression varied by infection type (strongest for lower respiratory tract infections, meningoencephalitis (UK only), urinary tract infections, and sepsis). Implications of all the available evidence: Our findings underscore the importance of both frailty and infection prevention in improving health in older age. Additional studies are required to explore other wider life-course influences on frailty, to guide the development of comprehensive preventive strategies.

12

Controlling for confounds in UK Biobank brain imaging data with small subsets of subjects

Radosavljevic, L.; Smith, S.; Nichols, T. E.

2026-03-03 epidemiology 10.64898/2026.03.02.26347455 medRxiv

Top 0.1%

6.5%

The UK Biobank (UKB) Brain Imaging cohort contains data from almost 100,000 subjects and has yielded invaluable understanding of the links between the brain and health outcomes and lifestyles. Much of the understanding of these links has come from exploring the association between Imaging Derived Phenotypes (IDPs) and other variables that are unrelated to brain imaging, so-called non-Imaging Derived Phenotypes (nIDPs). When performing analysis of this kind, it is very important to control for well-known confounding factors such as age, sex and socio-economic status, as well as confounds which are related to the imaging protocol itself. In previous work, we created a pipeline for constructing imaging confounds for use in statistical inference via a standard multivariate linear regression approach (Alfaro-Almagro et al. 2021). However, this approach is problematic when the number of confounds exceeds the number of subjects, and is severely underpowered when the number of subjects is not much larger than the number of confounds. In this work, we perform a simulation study to evaluate 13 modelling approaches to account for confounds when their number is similar to or exceeds the number of subjects. Based on the simulation results, we recommend a ridge regression based permutation test for low sample sizes (n ≤ 50), a version of de-sparsified LASSO for intermediate sample sizes (50 < n ≤ 500), and multivariate linear regression aided by Principal Component Analysis (PCA) for larger sample sizes (n > 500). We also demonstrate the use of our recommended methodology on a real data example of finding associations between Alzheimer's Disease (AD) and IDPs.
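
The sample-size recommendation in the abstract above can be read as a simple lookup rule. The sketch below encodes only the thresholds stated there; the function name and the returned labels are hypothetical and do not correspond to any published software interface.

```python
def recommended_confound_method(n_subjects):
    """Map a sample size to the approach recommended in the abstract.

    Thresholds (n <= 50, 50 < n <= 500, n > 500) are taken from the
    abstract; the label strings are illustrative only.
    """
    if n_subjects < 1:
        raise ValueError("n_subjects must be a positive integer")
    if n_subjects <= 50:
        return "ridge regression based permutation test"
    if n_subjects <= 500:
        return "de-sparsified LASSO"
    return "PCA-aided multivariate linear regression"


# Boundary cases around the two thresholds.
choices = {n: recommended_confound_method(n) for n in (50, 51, 500, 501)}
```

The boundaries are inclusive on the lower regime in each case, so a study with exactly 50 or exactly 500 subjects falls in the smaller-sample recommendation.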

13

Simulation-Based Comparison of Controlled Interrupted Time Series (CITS) and Multivariable Regression

ORWA, F. O.; Mutai, C.; Nizeyimana, I.; Mwangi, A.

2026-04-13 health policy 10.64898/2026.04.10.26350670 medRxiv

Top 0.1%

6.4%

When randomized controlled trials are impractical, interrupted time series designs offer a rigorous quasi-experimental approach to assess population level policies. Indeed, in the context of quasi-experimental designs (QEDs), the Interrupted Time Series (ITS) method is commonly thought of as the most robust. But interrupted time series designs are susceptible to serial correlation and confounding by time-varying factors associated with both the intervention and the outcome, which may result in biased inference. Thus, we provide a simulation-based contrast of controlled interrupted time series (CITS) and multivariable regression (multivariable negative binomial regression) for estimation of policy effects in count time series data. These approaches are widely used in policy evaluations, yet their comparative performance in typical population health settings has rarely been examined directly. We tested both approaches within a variety of data generating situations, differing in the series length, intervention effect size, and magnitude of lag-1 autocorrelation. Bias, standard error calibration, confidence interval coverage, mean squared error, and statistical power were assessed for performance. Both methods gave unbiased estimates for moderate and large intervention effects, although bias was more pronounced for small effects, particularly in short series. Although the point estimate performance was similar, inferential properties varied significantly. CITS always had smaller mean squared error, better consistency between model based and empirical standard errors, and confidence interval coverage near the 95% nominal levels over weak to moderate autocorrelation. By contrast, multivariable regression was more sensitive to serial dependence, leading to underestimated standard errors and undercoverage, especially at moderate to high autocorrelation, regardless of Newey-West adjustments. 
These findings show the benefits of using a concurrent control series and the importance of structurally accounting for serial correlation when studying population level policies with time series data.
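
The lag-1 autocorrelation varied in this simulation study can be illustrated with a minimal Python sketch. This is not the authors' simulation code: the function names, the Gaussian AR(1) error model, the series length, and the seed are all assumptions chosen for illustration, and the study itself concerns count outcomes rather than Gaussian errors.

```python
import random


def simulate_ar1(n, rho, seed=0):
    """Return n values of a zero-mean AR(1) process: e[t] = rho * e[t-1] + noise.

    Hypothetical helper; standard-normal noise is an assumption.
    """
    rng = random.Random(seed)
    errors = []
    prev = 0.0
    for _ in range(n):
        prev = rho * prev + rng.gauss(0.0, 1.0)
        errors.append(prev)
    return errors


def lag1_autocorrelation(series):
    """Sample lag-1 autocorrelation of a sequence of numbers."""
    n = len(series)
    mean = sum(series) / n
    num = sum((series[t] - mean) * (series[t - 1] - mean) for t in range(1, n))
    den = sum((x - mean) ** 2 for x in series)
    return num / den


if __name__ == "__main__":
    # Weak, moderate, and high serial dependence (illustrative values only).
    for rho in (0.0, 0.4, 0.8):
        est = lag1_autocorrelation(simulate_ar1(5000, rho, seed=1))
        print(f"true rho={rho:.1f}, estimated={est:.2f}")
```

When errors are positively autocorrelated like this, a model that treats observations as independent will tend to report standard errors that are too small, which is the undercoverage pattern the abstract describes.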

14

An AI Agent for Automated Causal Inference in Epidemiology

Liu, H.; Shi, K.; Li, A.; Li, X.; Chu, J.; Xue, Y.; Cen, S.; Wang, Y.; Zhang, T.

2026-02-06 epidemiology 10.64898/2026.02.06.26345723 medRxiv

Top 0.1%

6.2%

Objective: To address the inefficiency, subjectivity, and high expertise barrier of traditional epidemiological causal inference, this study designed, developed, and validated an AI-powered agent (EpiCausalX Agent) to automate the end-to-end workflow. It integrates cross-database literature retrieval, intelligent causal reasoning, and Directed Acyclic Graph (DAG) visualization to provide a reliable, accessible tool for researchers. Materials and Methods: Built on the LangChain 1.0 framework with a layered design (Agent/Tool/Storage/Utility Layers), the agent uses the DeepSeek V3.2 LLM and the ReAct paradigm for dynamic task orchestration. Four specialized tools were integrated: multi-database retrieval across 7 databases, causal inference based on Hill's criteria and DAG logic, automated DAG drawing using NetworkX and Matplotlib, and clinical standard query. Performance was validated via unit tests, workflow verification, and usability testing. Results: The agent achieved full-process automation. It efficiently retrieves and synthesizes literature, automatically identifies confounders and mediators, and generates standardized interactive DAGs. It produces evidence-based, traceable conclusions aligned with established epidemiological knowledge. Its user-friendly natural language interface enables seamless use by non-technical researchers, who complete task initiation quickly without operational confusion. The agent is publicly available as a WeChat Mini Program for easy access. Conclusion: EpiCausalX Agent advances intelligent, automated epidemiological research. By integrating domain expertise with AI agent technology, it overcomes limitations of manual methods and general LLMs to provide a specialized, verifiable, efficient solution. It has broad applications in observational research, clinical study design, and education to enhance productivity and lower barriers to rigorous causal analysis.

15

Causal analyses using education-health linked data for England: a case study

De Stavola, B. L. L.; Aparicio Castro, A.; Nguyen, V. G.; Lewis, K. M.; Dearden, L.; Harron, K.; Zylbersztejn, A.; Shumway, J.; Gilbert, R.

2026-03-19 health policy 10.64898/2026.03.13.26348340 medRxiv

Top 0.1%

4.7%

Introduction: This article summarises lessons learnt from the Health Outcomes for young People throughout Education (HOPE) Study and serves as a real-world, transferable application for addressing causal questions using administrative data. The HOPE study applied causal methods to analyses of administrative data in Education and Child Health Insights from Linked Data (ECHILD), aimed at studying the effectiveness of provision for special educational needs and disability (SEND) on health and education outcomes. Methods: Defining causal questions regarding the impact of SEND provision required judicious mapping of the question onto the data, leading to the selection of appropriate measures of effect, transparent handling of the data and control of confounding factors to estimate effects. We adopted the target trial emulation framework to guide these steps. Having encountered specific computational challenges in estimating the effects of interest, we simulated data that resembled the HOPE study and used them to practise the implementation of alternative estimation methods and to study the impact of some of their assumptions. Results: The creation and analysis of the simulated data provided valuable insights. First, we learned the importance of aligning the target of estimation with the causal question at hand. Second, we observed how deviations from assumptions specific to each estimation method can affect results. Third, we highlighted the benefits of employing alternative estimation methods as sensitivity tools that can aid the interpretation of the resulting estimates. Finally, we offer user-friendly code in two programming languages (R and Stata) and accompanying simulated data to facilitate the implementation of these methods for similar causal questions.
Conclusion: We recommend that users of administrative data fully specify (and possibly revise) the causal questions they wish to address and carefully examine and compare the assumptions, implementation and results obtained using alternative estimation methods.

16

Using Negative Control Outcomes to Detect Selection Bias in Mendelian Randomization Studies

Gkatzionis, A.; Davey Smith, G.; Tilling, K.

2026-02-01 epidemiology 10.64898/2026.01.30.26345215 medRxiv

Top 0.1%

4.4%

Mendelian randomization is currently mainly implemented through the use of genetic variants as instrumental variables to investigate the causal effect of an exposure on an outcome of interest. Mendelian randomization studies are robust to confounding bias and reverse causation, but they remain susceptible to selection bias; for example, this can happen if the exposure or outcome are associated with selection into the study sample. Negative controls are sometimes used to detect biases (typically due to confounding) in observational studies. Here, we focus specifically on Mendelian randomization analyses and discuss under what conditions a variable can be used as a negative control outcome to detect selection mechanisms that could bias Mendelian randomization estimates. We show that the main requirement is that the negative control outcome relates to confounders of the exposure and outcome. Counter-intuitively, the effect of the negative control on selection is of secondary concern; for example, a variable that does not affect selection can be a valid negative control for an outcome that does. We also investigate under what conditions age and sex can be used as negative control outcomes in Mendelian randomization analyses. In a real-data application, we investigate the pairwise causal relationships between 19 traits, utilizing data from the UK Biobank. Treating biological sex as a negative control outcome, we identify selection bias in analyses involving commonly used traits such as alcohol consumption, body mass index and educational attainment.

17

Longitudinal clustering of health behaviours and their association with multimorbidity: Evidence from Understanding Society (UKHLS)

Suhag, A.; Webb, T. L.; Holmes, J.

2026-02-17 epidemiology 10.64898/2026.02.13.26346295 medRxiv

Top 0.1%

4.4%

Background: Smoking, unhealthy nutrition, alcohol consumption, and physical inactivity (SNAP behaviours) are major risk factors for multimorbidity but are often studied in isolation. Using longitudinal data, Suhag et al. identified clusters of older adults (aged ≥50) with common SNAP behaviour patterns and distinct sociodemographic profiles and multimorbidity prevalence; whether and how these patterns generalise across adulthood remains unclear. Aim: To conceptually replicate Suhag et al. across a wider age range using an independent panel study. Methods: We used data from Waves 7-13 of the UK Household Longitudinal Study, analysing adults (aged ≥16) participating across all seven waves (n=18,008). Repeated-measures latent class analysis identified clusters of adults with common SNAP behaviours at Waves 7, 9, 11 and 13. Multinomial and binomial logistic regression examined how clusters were associated with sociodemographic characteristics and disease status (six disease groups plus multimorbidity), respectively. Findings: Seven clusters were identified: Overall Low-risk (20% of the sample), Insufficiently active (18%), Poor diet and Insufficiently active (23%), Hazardous and Harmful drinkers (11%), Hazardous drinkers, Insufficiently active and Poor diet (14%), Smokers and Drinkers (5%), and Smokers (9%). Behavioural profiles within clusters were largely stable over time. Associations between clusters and disease outcomes were counterintuitive. The cluster labelled Overall Low-risk on the basis of SNAP behaviours had the highest prevalence of multimorbidity, whereas the Hazardous drinkers, Insufficiently active and Poor diet cluster showed lower prevalence across most conditions.
These clusters also differed in sociodemographic composition: the Overall Low-risk cluster comprised mainly older women with lower education and income, while the Hazardous drinkers, Insufficiently active and Poor diet cluster was more likely to comprise individuals in the highest education and income groups. Conclusion: Cluster-analytic techniques can be used to identify population subgroups with distinct behavioural and disease profiles, underscoring the need to consider risk behaviours in conjunction with sociodemographic context.

18

Comparison of methods for assessing effects of risk factors on disease progression in Mendelian randomization under index event bias

Zhang, L.; Higgins, I. A.; Dai, Q.; Gkatzionis, A.; Quistrebert, J.; Bashir, N.; Dharmalingam, G.; Bhatnagar, P.; Gill, D.; Liu, Y.; Burgess, S.

2026-03-02 epidemiology 10.64898/2026.02.26.26347193 medRxiv

Top 0.1%

4.2%

Mendelian randomization has emerged as a transformative approach for inferring causal relationships between risk factors and disease outcomes. However, applying Mendelian randomization to disease progression - a critical step in validating pharmacological targets - is hampered by index event bias. This form of selection bias occurs because analyses of disease progression are necessarily restricted to individuals who have already experienced the disease event. Here, we present a comprehensive evaluation of statistical methods designed to mitigate index event bias, including inverse-probability weighting, Slope-Hunter, and multivariable methods. We compare the performance of these methods in simulations and applied examples. Inverse-probability weighting methods reduce bias, but require individual-level data and will only fully eliminate bias when the disease event model is correctly specified. Slope-Hunter performed poorly in all simulation scenarios, even when its assumptions were fully satisfied. Multivariable methods worked best when including genetic variants that affect the incident disease event. However, if these genetic variants also affect disease progression directly, then the analysis will suffer from pleiotropy. Hence, if the same biological mechanisms affect disease incidence and progression, then multivariable methods will have little utility. But in such a case, analyses of disease progression are less critical, as conclusions reached from analyses of disease incidence are likely to hold for disease progression. Our findings indicate that no single method is a universal solution to provide reliable results for the investigation of disease progression. Instead, we propose a strategic framework for method selection based on data availability and biological context.

19

11 million days of longitudinal wearable data reveal novel future health insights

Fulda, E. S.; Waxse, B. J.; Goleva, S. B.; Tran, T. C.; Taylor, H. J.; Bailey, C. P.; Wolff-Hughes, D. L.; Mo, H.; Zeng, C.; Keaton, J. M.; Ferrara, T. M.; Topiwala, A.; Doherty, A.; Denny, J. C.

2026-01-30 epidemiology 10.64898/2026.01.29.26344899 medRxiv

Top 0.1%

4.0%

Background: Insufficient physical activity (PA) is associated with higher risk of morbidity and premature mortality. Wearable devices offer a scalable, objective measurement of physical activity, but most studies reduce these data to a single activity metric measured over a fixed 7-day period. We compared different wearable-derived phenotyping approaches to understand their impact on activity-disease associations. Methods: We analyzed 11 million days of Fitbit data from 29,351 participants in the All of Us Research Program, deriving four daily activity metrics (step count, peak 1-min cadence, peak 30-min cadence, and heart rate per step) across five time windows (1-day, 1-week, 1-month, 6-months, 1-year). We performed phenome-wide analyses on >700 incident and >1,300 prevalent disease outcomes identified from linked electronic health records. Findings: Among participants with EHR and Fitbit data (mean age 57.3 years, 69% female, 47% with >1 year of Fitbit data), all 20 phenotypes were highly correlated (median Pearson r = 0.71). Longer measurement windows yielded stronger and more stable associations, with 1-year step count associated with 373 prevalent and 37 incident outcomes (versus 231 and 17 for 1-day step count) after Bonferroni correction, including novel associations with chronic pain syndrome, SARS-CoV-2, and autoimmune disease. Differences between prevalent and incident associations suggest that activity metrics can act as either early markers of disease or risk factors. Interpretation: These findings highlight how large-scale, longitudinal wearable data can advance understanding of health and disease and inform scalable approaches for clinical risk stratification. Funding: National Institutes of Health Intramural Research Program, Wellcome Trust. Research in context. Evidence before this study: Low levels of physical activity relate to numerous health outcomes.
However, prior studies are limited by a focus on disease prevalence and by a lack of examination across a broad range of health outcomes. Further, the strength of these associations depends on how physical activity is measured. Prior work shows that wearable devices capture activity more reliably than self-report surveys and typically yield stronger associations with disease risk. Most wearable-based studies rely on short monitoring windows, often seven days or fewer. To our knowledge, no study has systematically evaluated how the duration of wearable-based phenotyping influences estimates of disease risk. To explore this, we searched PubMed using the terms "wearable phenotyping" AND "disease risk", resulting in 48 articles published between 2016 and 2025. Although some studies compared different wearable-derived phenotypes (e.g., step count vs. sleep duration) or explored how the number of observed days affects data quality, none directly evaluated how the length of the phenotyping period shapes associations with disease risk. Added value of this study: Using nearly 11 million person-days of Fitbit data from ~30,000 participants, this study evaluates how four wearable-derived activity metrics, summarized across five time windows, influence estimates of activity-disease associations. We identified over 300 previously unreported associations for any of our four metrics and various health outcomes. Longer phenotyping windows consistently yielded stronger associations than shorter ones, although all windows remained informative. These findings highlight the importance of extended wearable monitoring for robust risk characterization. We further compared incident cases with both prevalent and incident outcomes, illustrating the roles of physical activity as a potentially modifiable risk factor and an early marker of disease. Implications of all the available evidence: These findings have two important implications.
First, longer periods of wearable data collection improve the accuracy of disease risk estimation and should be considered in the design of epidemiologic studies and in the development of clinical guidelines. Although associations between physical activity and disease were directionally consistent across all time windows, effect sizes varied substantially, an observation with important consequences for public health recommendations. Second, this study represents one of the first large-scale demonstrations of long-term wearable monitoring for real-world risk stratification, marking an important advance toward individualized health assessment and intervention.

20

Economic burden of cancer and cardiovascular disease mortality among working-age Europeans: A lifecycle modelling study

Hanly, P. A.; Ortega-Ortega, M.; Kong, Y.-C.; Cancela, M. D. C.; Soerjomataram, I.

2026-02-24 health economics 10.64898/2026.02.13.26346233 medRxiv

Top 0.1%

3.8%

Objectives: Non-communicable diseases (NCDs) account for almost 90% of deaths in Europe, yet comparative estimates of the productivity costs associated with premature NCD mortality across diseases and countries remain limited. This study estimates and compares productivity losses attributable to cardiovascular disease (CVD) and cancer mortality among working-age populations across Europe. Population-based data were used to estimate productivity costs for CVD and cancer deaths across 30 European countries. Sex- and age-specific mortality data for 2021 were obtained from the World Health Organization Mortality Database. Economic data, including wages, unemployment rates, and labour force participation rates, were sourced from Eurostat. Productivity losses were valued using a human capital approach incorporating an age-transition lifecycle simulation model that adjusts for lifetime wage trajectories and labour market dynamics. Costs were discounted at 3.5%. Total productivity losses from cancer and CVD mortality in working-age populations were estimated at €195.7 billion, equivalent to 1.24% of European GDP. Cancer accounted for 62.5% (€122.2 billion) of total productivity losses, while CVD accounted for 37.5% (€73.5 billion). Total CVD-related productivity costs exceeded cancer-related costs in Central and Eastern Europe, whereas cancer productivity costs were higher in Western, Northern, and Southern Europe. Mean productivity costs per death were higher for CVD (€219,848; 95% CI 165,241-270,247) than for cancer (€217,744; 95% CI 166,554-273,144). A larger gender gap was observed for CVD mortality, with a male-to-female cost ratio of 2.5 compared with 1.6 for cancer. Productivity losses associated with premature cancer and CVD mortality represent a substantial economic burden across Europe, with pronounced variation by disease, region, and sex.
These findings provide comparative, cross-country estimates of the human capital costs associated with major NCD causes of death.
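
The discounting step of a human capital valuation can be sketched in Python as follows. This is a simplified illustration, not the authors' lifecycle model: the function names, the input values in the usage example, and the convention that the year of death is undiscounted are all assumptions. Only the 3.5% rate and the three labour market inputs (wages, unemployment, labour force participation) come from the abstract.

```python
def expected_earnings(wage, participation_rate, unemployment_rate):
    """Expected annual earnings for one person-year.

    Hypothetical helper: wage weighted by the probability of being in the
    labour force and, if in it, of being employed.
    """
    return wage * participation_rate * (1.0 - unemployment_rate)


def discounted_productivity_loss(annual_earnings, discount_rate=0.035):
    """Present value of a stream of expected future annual earnings.

    annual_earnings[k] is the expected earnings k years after death;
    k = 0 (the year of death) is left undiscounted, which is an assumption.
    """
    return sum(
        earnings / (1.0 + discount_rate) ** k
        for k, earnings in enumerate(annual_earnings)
    )


if __name__ == "__main__":
    # Illustrative inputs only: a flat wage over 10 remaining working years.
    yearly = [expected_earnings(30000.0, 0.8, 0.05)] * 10
    print(round(discounted_productivity_loss(yearly), 2))
```

A lifecycle model of the kind described would replace the flat earnings stream with age-specific wages and labour market rates for each remaining working year.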